feat: migrate sitemap to App Router ISR with automation #367

Closed
amaan-bhati wants to merge 29 commits into main from dynamic-sitemap-app-router

Member

@amaan-bhati amaan-bhati commented Apr 10, 2026

Follow-up to the Pages Router dynamic sitemap PR: #355

Summary

Replaces the Pages Router sitemap (pages/sitemap.xml.ts) and its custom crawler/retry/fallback machinery with a Next.js App Router ISR Route Handler that delegates caching entirely to Vercel's CDN.

  • app/sitemap.xml/route.ts — ISR Route Handler (revalidate = 3600). Returns application/xml directly via new Response(). After first generation, every request is served from Vercel CDN edge (<10ms, no Lambda invoked). If WordPress is down during a background regeneration cycle, Vercel automatically keeps serving the previous good cached version.
  • lib/api-server.ts — Server-only module using node:https directly instead of fetch(). Required because Next.js App Router wraps fetch() with RSC instrumentation that causes Cloudflare (in front of wp.keploy.io) to return 502 HTML. node:https bypasses this entirely, identical to what curl sends.
  • lib/sitemap.ts — Serializer utilities only (buildPostEntries, buildAuthorEntries, buildTagEntries, serializeSitemap, adaptPostsForSitemap, assertFullSitemap). No fetching, no retry logic.
  • pages/sitemap.xml.ts — Deleted. Conflicts with the App Router route at the same URL path.
  • pages/api/cron/refresh-sitemap.ts — Rewritten to GSC-submit only (~50 lines, maxDuration: 30). Sitemap generation is now ISR's job; the cron only tells Google the sitemap has been updated.
  • app/layout.tsx — Required by Next.js 14 when app/ directory is present. Must include <html><body> or the App Router runtime fails.
  • app/not-found.tsx — Redirects unmatched App Router paths to the Pages Router 404 page.
  • .github/workflows/prewarm-sitemap.yml — Triggered by Vercel deploy hook. Warms the ISR cache immediately after deploy so the first crawler never hits a cold Lambda.
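
For orientation, the route handler described in the list above could look roughly like the sketch below. Only `revalidate = 3600`, the `application/xml` content type, and the use of `new Response()` come from this description; `generateSitemapXml` and `xmlResponse` are hypothetical names, and the stub stands in for the real fetch and serialize pipeline. Error handling is omitted.

```typescript
// Sketch of app/sitemap.xml/route.ts, under the assumptions stated above.
export const revalidate = 3600; // seconds; ISR regenerates at most once per hour

// Hypothetical stand-in for the real WordPress fetch + serialization step.
async function generateSitemapXml(): Promise<string> {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    "  <url><loc>https://example.com/blog</loc></url>",
    "</urlset>",
  ].join("\n");
}

// Wraps an XML string in a Response carrying the XML content type.
export function xmlResponse(xml: string, status = 200): Response {
  return new Response(xml, {
    status,
    headers: { "Content-Type": "application/xml" },
  });
}

// Route Handler entry point: Next.js calls GET() when the cache needs (re)generation.
export async function GET(): Promise<Response> {
  return xmlResponse(await generateSitemapXml());
}
```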

What the review asked for → what we did

| Review suggestion | Status |
| --- | --- |
| Reuse lib/api.ts instead of custom crawler | Done via lib/api-server.ts (same pagination logic, node:https transport to fix RSC 502) |
| getStaticProps + revalidate instead of getServerSideProps | Done: ISR Route Handler with revalidate = 3600 |
| Vercel ISR stale-while-revalidate instead of 3-tier fallback | Done: fallback machinery deleted |
| ISR deduplicates revalidation natively | Done: concurrency guard deleted |
| maxDuration: 30 on cron | Done |
| Cron only submits GSC | Done |

TypeScript fixes (side effect of adding app/ directory)

Next.js 14 automatically enables strictNullChecks: true in tsconfig.json when an app/ directory is created. This surfaced pre-existing implicit typing issues across several components. All fixes are type-annotation-only — no runtime behaviour changed.

Vercel Pricing: how sitemap ISR usage is counted

Vercel ISR (our GET /blog/sitemap.xml) is billed on these meters:

  • Function invocations + compute

    • A serverless function runs only when the sitemap needs to regenerate (after revalidate), not on every request.
    • Compute is measured as Active CPU + Provisioned Memory time. Network I/O (waiting on WordPress) generally contributes far less to Active CPU than to wall time.
  • ISR durable cache writes (8KB units)

    • When regeneration produces new output, Vercel persists it to durable ISR storage.
    • Usage is counted in 8KB units.
    • If a regeneration produces identical output, Vercel does not charge write units for it.
  • ISR durable cache reads (8KB units)

    • Counted when the CDN edge doesn’t have the cached sitemap and Vercel reads it from durable ISR storage.
    • Most requests should be served from the CDN edge cache, so durable reads are typically low.
  • Cron

    • Cron scheduling is included, but each cron run is still a normal function invocation.
    • Our cron endpoint only submits to Google Search Console; it does not regenerate the sitemap.

What our implementation costs (expected)

Sitemap config

  • revalidate = 3600 (hourly)
  • 450 community + ~28 tech posts (478 post URLs) + static routes + derived author/tag URLs
  • Publish cadence: 1 post every 2 days (15 posts/month)

Expected monthly usage attributable to the sitemap

  • Regenerations that actually change output: typically ~15 to 22 per month (aligned with publishing cadence).
  • Worst-case regeneration frequency: up to ~720/month (24 per day × 30 days) if the sitemap is requested around/after every TTL expiry, but most will be identical output and won’t incur ISR write units.
  • ISR write units: typically hundreds to low thousands of 8KB units/month, since writes mainly happen when content changes.
  • Function compute: typically very low; most requests are served from CDN cache and regeneration is mostly network I/O + string building.
  • Incremental $ cost on Vercel Pro: effectively ~$0/month in normal conditions (should remain within Pro included quotas).
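
The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a billing calculator: it assumes "8KB" means 8192 bytes, that each changed regeneration writes the whole payload once, and a hypothetical sitemap size of 120 KiB (the real size is not stated in this PR).

```typescript
const SECONDS_PER_DAY = 86400;
const ISR_UNIT_BYTES = 8 * 1024; // assumption: one 8KB unit = 8192 bytes

// Upper bound on regenerations if every TTL expiry triggers one.
function maxRegenerationsPerMonth(revalidateSeconds: number, days = 30): number {
  return Math.floor((SECONDS_PER_DAY * days) / revalidateSeconds);
}

// Number of 8KB units needed to store a payload of the given size.
function isrUnits(payloadBytes: number): number {
  return Math.ceil(payloadBytes / ISR_UNIT_BYTES);
}

const worstCaseRegens = maxRegenerationsPerMonth(3600); // 720
const assumedSitemapBytes = 120 * 1024; // hypothetical sitemap size
const unitsPerWrite = isrUnits(assumedSitemapBytes); // 15
const writesAt15Posts = 15 * unitsPerWrite; // 225 units/month
const writesAt22Changes = 22 * unitsPerWrite; // 330 units/month

console.log({ worstCaseRegens, unitsPerWrite, writesAt15Posts, writesAt22Changes });
```

Under these assumptions the monthly write volume lands in the "hundreds of units" range quoted above; a larger sitemap scales the figure linearly.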

Sources

[S1] Vercel Docs — ISR Usage and Pricing (8KB units, reads/writes, durable vs CDN, identical output note)
https://vercel.com/docs/pricing/incremental-static-regeneration

[S2] Vercel Docs — Fluid compute pricing (Active CPU billed only during execution; pauses during I/O)
https://vercel.com/docs/functions/usage-and-pricing/

[S3] Vercel Docs — Cron Jobs: Usage & Pricing (cron jobs invoke functions; cron included; function pricing applies)
https://vercel.com/docs/cron-jobs/usage-and-pricing

Summary table

| Area | Earlier approach (pre-review direction) | Current implementation (this PR) | Review suggestion on previous PR |
| --- | --- | --- | --- |
| Serving route | getServerSideProps / per-request generation | App Router route: app/sitemap.xml/route.ts | Pages route: pages/sitemap.xml.ts |
| Caching/fallback | Custom 3-tier (memory → /tmp → static) | Platform-managed via ISR + CDN; explicit 503 fallback only when no good cache exists | Platform-managed via ISR + CDN |
| Regeneration | Cron-driven refresh + concurrency guards | revalidate = 3600 (ISR); regen on access after TTL | revalidate = 3600 (ISR) |
| WordPress data fetch | Custom crawler + retries | Server-only crawl in lib/api-server.ts (minimal fields) | Reuse lib/api.ts helpers |
| Cron responsibility | Refresh sitemap + GSC submission | GSC submission only | GSC submission only |
| Reliability on deploy | Fragile (memory + /tmp reset on deploy) | Strong once first sitemap is generated and cached (ISR handles stale serving) | Strong (ISR) |
| Maintenance surface | High (duplicate data layer + caching logic) | Moderate (sitemap builders + server-only fetcher) | Lowest (reuse lib/api.ts) |

A few things we did not implement exactly as suggested:

  • We did not follow the exact “getStaticProps + reuse lib/api.ts” sketch. Generating the sitemap in the App Router context through the global fetch() path (used by lib/api.ts) returned 502 HTML from Cloudflare/WP in production. I also reproduced this locally, where it threw an error in the terminal.
  • To keep sitemap generation reliable, we use a small server-only GraphQL transport in lib/api-server.ts (raw node:http/https), which bypasses the Next App Router fetch instrumentation.
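
A minimal sketch of what such a node:https GraphQL transport might look like follows. It is not the contents of lib/api-server.ts: the names `buildGraphQLRequest` and `postGraphQL`, the 10-second timeout, and the error messages are all assumptions made for illustration.

```typescript
import * as https from "node:https";

interface GraphQLResponse<T> {
  data?: T;
  errors?: { message: string }[];
}

// Builds the POST body and request options (kept pure so it can be tested offline).
export function buildGraphQLRequest(query: string, variables: Record<string, unknown> = {}) {
  const body = JSON.stringify({ query, variables });
  return {
    body,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
      },
    },
  };
}

// Sends a GraphQL query over node:https, bypassing the global fetch().
export function postGraphQL<T>(
  endpoint: string,
  query: string,
  variables: Record<string, unknown> = {},
): Promise<T> {
  const { body, options } = buildGraphQLRequest(query, variables);
  return new Promise<T>((resolve, reject) => {
    const req = https.request(endpoint, options, (res) => {
      const chunks: Buffer[] = [];
      res.on("data", (chunk: Buffer) => chunks.push(chunk));
      res.on("end", () => {
        // A non-200 status (for example a 502 HTML page) is rejected before parsing.
        if (res.statusCode !== 200) {
          reject(new Error(`GraphQL HTTP ${res.statusCode}`));
          return;
        }
        try {
          const json = JSON.parse(Buffer.concat(chunks).toString("utf8")) as GraphQLResponse<T>;
          if (json.errors?.length || json.data === undefined) {
            reject(new Error(json.errors?.[0]?.message ?? "Empty GraphQL response"));
          } else {
            resolve(json.data);
          }
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    // Assumed timeout value; abort the request if the socket stays idle too long.
    req.setTimeout(10000, () => req.destroy(new Error("GraphQL request timed out")));
    req.end(body);
  });
}
```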

Why lib/sitemap.ts still exists

lib/sitemap.ts is not a parallel WordPress crawler anymore. It is the sitemap builder module:

  • What it contains: static route list, post→URL mapping (posts/authors/tags), XML escaping + serialization, URL dedupe, and a “partial data” guard (assertFullSitemap) to avoid caching incomplete crawls.
  • What it does not do: it does not fetch WordPress data, implement retry loops, maintain in-memory or /tmp caches, or run cron-based regeneration. WordPress crawling/pagination for the sitemap happens in lib/api-server.ts only.
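
To make the builder's responsibilities concrete, here is a simplified sketch of the escaping, dedupe, serialization, and partial-data guard described above. The function names `serializeSitemap` and `assertFullSitemap` appear in this PR, but the signatures, the `SitemapEntry` shape, and the helpers `escapeXml`, `dedupeByLoc`, and `minPostsPerCategory` are assumptions for illustration only.

```typescript
interface SitemapEntry {
  loc: string;
  lastmod?: string;
}

// Escapes the five XML special characters.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Keeps the first entry for each URL and drops later duplicates.
function dedupeByLoc(entries: SitemapEntry[]): SitemapEntry[] {
  const seen = new Set<string>();
  return entries.filter((entry) => {
    if (seen.has(entry.loc)) return false;
    seen.add(entry.loc);
    return true;
  });
}

// Serializes entries into a sitemap <urlset> document.
function serializeSitemap(entries: SitemapEntry[]): string {
  const urls = dedupeByLoc(entries).map((entry) => {
    const lastmod = entry.lastmod ? `<lastmod>${escapeXml(entry.lastmod)}</lastmod>` : "";
    return `  <url><loc>${escapeXml(entry.loc)}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

// Parses a minimum-count setting, falling back when the value is not a number.
function minPostsPerCategory(raw: string | undefined, fallback = 5): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Throws when any category has fewer posts than the minimum (likely a partial crawl).
function assertFullSitemap(counts: Record<string, number>, min: number): void {
  for (const [category, count] of Object.entries(counts)) {
    if (count < min) {
      throw new Error(`Partial crawl: ${category} has ${count} posts, expected at least ${min}`);
    }
  }
}
```

Throwing from the guard before a response is produced is what keeps an incomplete crawl from being cached as the new sitemap.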

amaan-bhati and others added 4 commits April 10, 2026 20:27
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Adds a GitHub Actions workflow triggered by a Vercel deploy hook that
hits /blog/sitemap.xml immediately after deployment, warming the ISR
cache so the first real user or crawler never hits a cold Lambda.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 15:01
… env

- lib/google-search-console.ts: OAuth2 JWT flow for GSC submission,
  required by pages/api/cron/refresh-sitemap.ts
- scripts/submit-sitemap-to-search-console.mjs: local dev script to
  manually submit sitemap to GSC without deploying
- vercel.json: add cron schedule (daily midnight), add missing redirect,
  fix CSP header source regex to exclude sitemap.xml and /api/ paths
- next.config.js: exclude sitemap.xml from Next.js CSP headers to keep
  both layers consistent with vercel.json
- playwright.config.ts: inject CRON_SECRET=test-secret so e2e cron
  tests can authenticate against the local dev server

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Migrates sitemap generation from the Pages Router implementation to a Next.js App Router ISR route handler, relying on Vercel CDN caching and simplifying the cron job to only notify Google Search Console.

Changes:

  • Add app/sitemap.xml/route.ts ISR route handler and new lib/api-server.ts (node:https) for server-side WPGraphQL access.
  • Refactor lib/sitemap.ts into pure sitemap adaptation/serialization utilities and remove the Pages Router sitemap endpoint.
  • Update cron handler and apply TypeScript strict-null-check fixes across several components/pages triggered by adding app/.

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tsconfig.json | Enables Next TS plugin + strictNullChecks, updates include paths for App Router types. |
| pages/technology/[slug].tsx | Fixes useRef initialization and types a previously implicit array. |
| pages/community/[slug].tsx | Fixes useRef initialization and types a previously implicit array. |
| pages/authors/[slug].tsx | Types a previously implicit array to satisfy strict null checks. |
| pages/api/cron/refresh-sitemap.ts | Simplifies cron endpoint to auth + optional GSC submission only. |
| lib/sitemap.ts | Introduces pure sitemap utilities (adapt/build/serialize) and fallback XML generator. |
| lib/api.ts | Tightens API URL config handling; expands getAllPosts query to include modified, categories/tags, and pageInfo pagination. |
| lib/api-server.ts | Adds server-only WPGraphQL client via node:https and a sitemap-focused pagination fetch. |
| components/TableContents.tsx | Fixes timeout ref typing and replaceState signature for strict null checks. |
| components/post-body.tsx | Adds explicit state typings for strict null checks. |
| components/NotFoundPage.tsx | Avoids optional-chaining pitfalls under strict null checks when slicing edges. |
| components/more-stories.tsx | Types the error state as `string |
| components/AuthorMapping.tsx | Adds explicit array typings for strict null checks. |
| app/sitemap.xml/route.ts | New ISR sitemap endpoint at /sitemap.xml (under basePath) with fallback behavior. |
| app/not-found.tsx | Adds App Router not-found boundary redirecting to Pages Router 404. |
| app/layout.tsx | Adds required App Router root layout with <html>/<body>. |
| .github/workflows/prewarm-sitemap.yml | Adds deploy-triggered workflow to warm the ISR sitemap cache. |
| pages/sitemap.xml.ts (deleted) | Removes Pages Router sitemap to avoid path conflict with App Router route. |

Comment thread lib/api-server.ts
Comment thread lib/sitemap.ts Outdated
amaan-bhati and others added 2 commits April 10, 2026 20:45
Cover the full ISR sitemap flow:
- Sitemap.spec.ts: status 200, Content-Type xml, correct s-maxage=3600 ISR
  cache headers, valid urlset structure, static routes presence, dynamic post
  count, lastmod dates, changefreq, CSP exclusion, and URL deduplication
- RefreshSitemapCron.spec.ts: 401 without auth, 401 with wrong secret, 405 for
  non-GET (method not leaked to unauthenticated callers), 200 skipped response
  when GSC env vars are absent in test env, and no-store cache-control check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Next.js raises a build error when a static file in public/ and a route
handler exist at the same path. The ISR route handler in
app/sitemap.xml/route.ts supersedes this stale 2024 static file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 15:19
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 25 changed files in this pull request and generated 4 comments.


Comment thread lib/api-server.ts Outdated
Comment thread lib/api-server.ts
Comment thread lib/sitemap.ts
Comment thread .github/workflows/prewarm-sitemap.yml Outdated
amaan-bhati and others added 2 commits April 10, 2026 21:13
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
- tests/mock-server.js: add AllPostsForSitemap handler before generic AllPosts
  guard, returning combined tech+community edges so assertFullSitemap sees posts
  in both categories (previously communityCount=0 caused 503 in every sitemap
  e2e test)
- pages/api/cron/refresh-sitemap.ts: set Cache-Control: no-store explicitly in
  the handler so the test assertion is framework-independent (vercel.json headers
  are a Vercel platform layer, not applied by the local Next.js server)
- lib/api-server.ts: replace allEdges=[...allEdges,...edges] in pagination loop
  with allEdges.push(...edges) to avoid O(n²) array allocation across pages
- lib/sitemap.ts: guard parseInt result with Number.isNaN so an invalid
  SITEMAP_MIN_POSTS_PER_CATEGORY env var falls back to 5 instead of silently
  making the assertion unreachable (NaN < 5 === false)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
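
The pagination change in this commit can be illustrated with a small cursor loop. This is a sketch only: `collectAllEdges`, the `Page` shape, and the `maxPages` safety cap are hypothetical and not taken from lib/api-server.ts. The point is that `push(...edges)` appends in place, whereas `allEdges = [...allEdges, ...edges]` copies every previously collected edge on each page.

```typescript
interface Page<T> {
  edges: T[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

// Follows endCursor until hasNextPage is false, accumulating edges in one array.
async function collectAllEdges<T>(
  fetchPage: (after: string | null) => Promise<Page<T>>,
  maxPages = 100, // assumed safety cap against a cursor that never terminates
): Promise<T[]> {
  const allEdges: T[] = [];
  let cursor: string | null = null;
  for (let page = 0; page < maxPages; page++) {
    const { edges, pageInfo } = await fetchPage(cursor);
    allEdges.push(...edges); // in-place append; fine for page-sized batches
    if (!pageInfo.hasNextPage || pageInfo.endCursor === null) break;
    cursor = pageInfo.endCursor;
  }
  return allEdges;
}
```

Spreading into `push` passes each edge as an argument, so it suits page-sized batches rather than arrays with hundreds of thousands of items.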
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 28 changed files in this pull request and generated 4 comments.


Comment thread lib/sitemap.ts Outdated
Comment thread lib/api-server.ts
Comment thread pages/api/cron/refresh-sitemap.ts Outdated
Comment thread vercel.json
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Copilot AI review requested due to automatic review settings April 10, 2026 16:11
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 28 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

vercel.json:60

  • The /blog/((?!(?:sitemap\.xml$|api/|_next/static/)).*) headers rule still matches /blog/_next/image and will apply the CSP and the generic Cache-Control to Next’s image optimizer responses, potentially overriding Next’s own long-lived image caching (notably you’ve set images.minimumCacheTTL to 1 year in next.config.js). Consider excluding _next/image (and any other non-HTML asset paths you rely on) from this rule so image requests keep their intended cache headers.
    {
      "source": "/blog/((?!(?:sitemap\\.xml$|api/|_next/static/)).*)",
      "headers": [
        {
          "key": "Content-Security-Policy",
          "value": "connect-src 'self' https://px.ads.linkedin.com https://www.google-analytics.com https://analytics.google.com https://region1.google-analytics.com https://stats.g.doubleclick.net https://rp.liadm.com https://idx.liadm.com https://pagead2.googlesyndication.com https://*.clarity.ms https://news.google.com https://assets.apollo.io https://wp.keploy.io https://cdn.hashnode.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://*.googlevideo.com https://googleads.g.doubleclick.net https://marketplace.visualstudio.com https://api.github.com https://pro.ip-api.com https://api.vector.co https://aplo-evnt.com https://ep1.adtrafficquality.google https://ppptg.com https://telemetry.keploy.io; frame-src 'self' https://www.googletagmanager.com https://keploy-websites.vercel.app https://blog-website-phi-eight.vercel.app https://docbot.keploy.io https://www.youtube.com https://youtube.com https://www.youtube-nocookie.com https://*.youtube.com https://news.google.com https://googleads.g.doubleclick.net https://*.google.com https://ppptg.com; img-src 'self' https://c.bing.com https://ppptg.com https://pbs.twimg.com https://secure.gravatar.com https://wp.keploy.io https://keploy.io data:;"
        },

Comment thread tests/e2e/Sitemap.spec.ts Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 28 changed files in this pull request and generated 1 comment.


Comment thread .github/workflows/prewarm-sitemap.yml Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 31 changed files in this pull request and generated 2 comments.


Comment thread lib/api-server.ts
Comment thread components/navbar/FloatingNavbar.tsx Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 27 out of 30 changed files in this pull request and generated 2 comments.


Comment thread app/not-found.tsx
Comment thread lib/api-server.ts Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 27 out of 30 changed files in this pull request and generated no new comments.


Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 29 changed files in this pull request and generated 1 comment.


Comment thread app/not-found.tsx Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 29 changed files in this pull request and generated 4 comments.


Comment thread lib/api.ts
Comment thread lib/api.ts Outdated
Comment thread lib/api.ts Outdated
Comment thread vercel.json
…y guards

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 29 changed files in this pull request and generated 1 comment.


Comment thread playwright.config.ts Outdated
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 29 changed files in this pull request and generated 1 comment.


Comment thread playwright.config.ts
Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 29 changed files in this pull request and generated no new comments.


@amaan-bhati amaan-bhati changed the title feat: migrate sitemap to App Router ISR (node:https, no custom crawler) feat: migrate sitemap to App Router ISR with automation Apr 13, 2026
@amaan-bhati
Member Author

Closing this PR since we found a cheaper, faster, and better-optimised approach in #374.


2 participants